我们提出了一个新颖的框架概念,以分析在预训练的语言模型中学习的潜在概念如何编码。它使用聚类来发现编码的概念,并通过与大量的人类定义概念对齐来解释它们。我们对七个变压器语言模型的分析揭示了有趣的见解:i)学习表示形式中的潜在空间与不同程度的不同语言概念重叠,ii)ii)模型中的下层以词汇概念(例如附加物为附加物)为主,而却是核心语言概念(例如形态学或句法关系)在中间和更高层中更好地表示,iii)一些编码的概念是多方面的,无法使用现有的人类定义的概念来充分解释。
translated by 谷歌翻译
尽管在理解深度NLP模型中学到的表示形式以及他们所捕获的知识方面已经做了很多工作,但对单个神经元的关注很少。我们提出了一种称为语言相关性分析的技术,可在任何外部特性中提取模型中的显着神经元 - 目的是了解如何保留这种知识在神经元中。我们进行了细粒度的分析以回答以下问题:(i)我们可以识别网络中捕获特定语言特性的神经元子集吗? (ii)整个网络中的局部或分布式神经元如何? iii)信息保留了多么冗余? iv)针对下游NLP任务的微调预训练模型如何影响学习的语言知识? iv)架构在学习不同的语言特性方面有何不同?我们的数据驱动的定量分析阐明了有趣的发现:(i)我们发现了可以预测不同语言任务的神经元的小亚集,ii)捕获基本的词汇信息(例如后缀),而这些神经元位于较低的大多数层中,iii,iii),而这些神经元,而那些神经元,而那些神经元则可以预测。学习复杂的概念(例如句法角色)主要是在中间和更高层中,iii),在转移学习过程中,显着的语言神经元从较高到较低的层移至较低的层,因为网络保留了较高的层以特定于任务信息,iv)我们发现很有趣在培训预训练模型之间的差异,关于如何保留语言信息,V)我们发现概念在多语言变压器模型中跨不同语言表现出相似的神经元分布。我们的代码作为Neurox工具包的一部分公开可用。
translated by 谷歌翻译
Natiq是阿拉伯语的端到端文本到语音系统。我们的语音合成器使用Encoder-Decoder架构引起了人们的注意。我们同时使用了基于TACOTRON的模型(Tacotron-1和Tacotron-2)和更快的变压器模型来从字符中生成MEL光谱图。我们将tacotron1与Wavernn Vocoder,Tacotron2与WaveLow Vocoder和ESPNET变压器与平行波甘gan vocoder串联,以从频谱图合成波形。我们使用了两个声音的内部语音数据:1)中立的男性“ hamza” - 叙述一般内容和新闻,以及2)表现力的女性“ Amina” - 叙述孩子的故事书来训练我们的模型。我们的最佳系统的平均平均意见评分(MOS)分别为Amina和Hamza的平均意见分别为4.21和4.40。使用单词和字符错误率(WER和CER)对系统的客观评估以及实时因子测量的响应时间有利于端到端体系结构ESPNET。 NATIQ演示可在线上https://tts.qcri.org提供
translated by 谷歌翻译
深层神经网络在各个领域的增殖已经增加了对这些模型的解释性的需求。沿着这条线进行的初步工作,调查了这种调查的论文集中在高级表示分析上。然而,最近的工作分支集中在这些模型中分析神经元的更详细水平上的可解释性。在本文中,我们调查了神经元分析所做的工作,包括:i)在网络中发现和理解神经元的方法,ii)评估方法,iii)主要发现,包括神经元分析已解散的跨架构比较,iv)神经元的应用。探索:控制模型,域适应等,v)关于开放问题和未来研究方向的讨论。
translated by 谷歌翻译
静态嵌入的后处理已成为提高其在词汇和序列级任务上的性能。但是,在上下文化嵌入的后处理是一个研究不足的问题。在这项工作中,我们质疑从不同训练的语言模型获得的上下文化嵌入的后处理的有用性。更具体地说,我们使用Z分数,Min-Max归一化以及使用全而top方法来删除顶部原理组件,将单个神经元激活标准化。此外,我们将单位长度标准化应用于单词表示。在各种预训练的模型集中,我们表明,在表示两个词汇任务(例如单词相似性和类比)和序列分类任务的表示后处理中存在重要信息。我们的发现提出了有关使用上下文表示表示的研究研究的有趣点,并建议在应用程序中使用Z分数归一化作为要考虑的重要步骤。
translated by 谷歌翻译
基于变压器的NLP模型是使用数亿甚至数十亿个参数训练的,从而限制了其在计算受限环境中的适用性。尽管参数的数量通常与性能相关,但尚不清楚下游任务是否需要整个网络。在最新的修剪和提炼预培训模型的工作中,我们探索了在预训练模型中放下层的策略,并观察修剪对下游胶水任务的影响。我们能够修剪Bert,Roberta和XLNet型号高达40%,同时保持其原始性能的98%。此外,我们证明,在大小和性能方面,您的修剪模型与使用知识蒸馏的型号相提并论。我们的实验产生有趣的观察结果,例如(i)下层对于维持下游任务性能最重要,(ii)某些任务(例如释义检测和句子相似性)对于降低层的降低和(iii)经过训练的模型更强大。使用不同的目标函数表现出不同的学习模式,并且层掉落。
translated by 谷歌翻译
Pretrained language models (PLMs) often fail to fairly represent target users from certain world regions because of the under-representation of those regions in training datasets. With recent PLMs trained on enormous data sources, quantifying their potential biases is difficult, due to their black-box nature and the sheer scale of the data sources. In this work, we devise an approach to study the geographic bias (and knowledge) present in PLMs, proposing a Geographic-Representation Probing Framework adopting a self-conditioning method coupled with entity-country mappings. Our findings suggest PLMs' representations map surprisingly well to the physical world in terms of country-to-country associations, but this knowledge is unequally shared across languages. Last, we explain how large PLMs despite exhibiting notions of geographical proximity, over-amplify geopolitical favouritism at inference time.
translated by 谷歌翻译
When people think of everyday things like an "egg," they typically have a mental image associated with it. This commonsense knowledge helps us understand how these everyday things work and how to interact with them. For example, when someone tries to make a fried egg, they know that it has a shell and that it can be cracked open to reveal the egg white and yolk inside. However, if a system does not have a coherent picture of such everyday things, thinking that the egg yolk surrounds the shell, then it might have to resort to ridiculous approaches such as trying to scrape the egg yolk off the shell into the pan. Do language models have a coherent picture of such everyday things? To investigate this, we propose a benchmark dataset consisting of 100 everyday things, their parts, and the relationships between these parts. We observe that state-of-the-art pre-trained language models (LMs) like GPT-3 and Macaw have fragments of knowledge about these entities, but they fail to produce consistent parts mental models. We propose a simple extension to these LMs where we apply a constraint satisfaction layer on top of raw predictions from LMs to produce more consistent and accurate parts mental models of everyday things.
translated by 谷歌翻译
Reliable forecasting of traffic flow requires efficient modeling of traffic data. Different correlations and influences arise in a dynamic traffic network, making modeling a complicated task. Existing literature has proposed many different methods to capture the complex underlying spatial-temporal relations of traffic networks. However, methods still struggle to capture different local and global dependencies of long-range nature. Also, as more and more sophisticated methods are being proposed, models are increasingly becoming memory-heavy and, thus, unsuitable for low-powered devices. In this paper, we focus on solving these problems by proposing a novel deep learning framework - STLGRU. Specifically, our proposed STLGRU can effectively capture both local and global spatial-temporal relations of a traffic network using memory-augmented attention and gating mechanism. Instead of employing separate temporal and spatial components, we show that our memory module and gated unit can learn the spatial-temporal dependencies successfully, allowing for reduced memory usage with fewer parameters. We extensively experiment on several real-world traffic prediction datasets to show that our model performs better than existing methods while the memory footprint remains lower. Code is available at \url{https://github.com/Kishor-Bhaumik/STLGRU}.
translated by 谷歌翻译
Most camera lens systems are designed in isolation, separately from downstream computer vision methods. Recently, joint optimization approaches that design lenses alongside other components of the image acquisition and processing pipeline -- notably, downstream neural networks -- have achieved improved imaging quality or better performance on vision tasks. However, these existing methods optimize only a subset of lens parameters and cannot optimize glass materials given their categorical nature. In this work, we develop a differentiable spherical lens simulation model that accurately captures geometrical aberrations. We propose an optimization strategy to address the challenges of lens design -- notorious for non-convex loss function landscapes and many manufacturing constraints -- that are exacerbated in joint optimization tasks. Specifically, we introduce quantized continuous glass variables to facilitate the optimization and selection of glass materials in an end-to-end design context, and couple this with carefully designed constraints to support manufacturability. In automotive object detection, we show improved detection performance over existing designs even when simplifying designs to two- or three-element lenses, despite significantly degrading the image quality. Code and optical designs will be made publicly available.
translated by 谷歌翻译